Taiwan
Japan
Korea
Thailand
---
title: "2017台電能源永續黑客松 "
output:
flexdashboard::flex_dashboard:
source_code: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE
)
library(flexdashboard)
library(data.table)
library(dplyr)
library(ggplot2)
library(plotly)
library(highcharter)
library(DT)
# 讀入分群完的原始資料
tp_cluster <- fread("processed_data/taipei456_2.csv")
# 雷達圖資料
cluster_rader<-read.csv("processed_data/radar_plot.csv",fileEncoding = 'utf8')
# 四國資料
asia <- fread("processed_data/Asia.csv")
```
Motivation {#motivation1 data-navmenu="Motivation"}
=====================================
Electricity Shortage is knocking our doors
In Taiwan, insufficient electricity supply is always an important issue. Especially in summer, we are usually under the risk of power outage. Sometimes, it really happened[(8/17 Power Outage in Taiwan)](https://udn.com/news/story/11419/2644282)In both Trends of Demand and Stability imply electricity shortage will be an even serious problem.
Electricity demand is ascending
In the past 4 years, the demand of electricity in Taiwan has increased about 8%. Furthermore, our government intends to shut down all the nuclear power plants by 2025. In fact, nuclear power plants account for 14% of total electricity supply.
Electricity supply becomes unstable
Operating reserve rate keeps decreasing and it hasn’t achieved the goal set by Taipower corp. 4 years in a row. Low operating reserve rate implies any unexpected power plant shut down Taiwan will suffer from power outage.
Current policies are paved with good intentions but inefficient
Although the government enforce a bunch of different policies to solve this problem, it isn’t effective at all. For example [Save the electricity on your own](http://energy-smartcity.energypark.org.tw/), [Countrywide Electricity saving completion.](http://energy-2016summer.energypark.org.tw/) However, governors didn’t clarify the reasons of wasting electricity for different regions. Without comprehending the actual reasons of wasting electricity, how can they come up with an appropriate policy to curb the electricity waste.
Our goal
The Electricity saving policies enforcing process can be broken down into 3 parts, Identify the regions wasting electricity, figure out the reasons they wasting, and Set up corresponding policies. Our product was designed to shorten the time consumed during this process and help governors apply the right policies on the right regions.
How we achieve our goal? U-Optimizer
To customize the policies by region is the key of achieving our goal. We utilized a lot of open data, Including demographics data, economic data, electricity usage data ….etc. Besides, we use both supervised and unsupervised machine learning methodologies to cluster the villages around Taiwan. In the end, we want to deliver a system which can assist governors to arrange the current policies to appropriate villages or set up a whole new policy for specific villages.
4 countries' MAP{#motivation2 data-navmenu="Motivation"}
=====================================
Column {data-width=500}
-------------------------------------
### Taiwan
Taiwan
### Japan
Japan
Column {data-width=500}
-------------------------------------
### Korea
Korea
### Thailand
Thailand
4 countries' Data {#motivation3 data-navmenu="Motivation"}
=====================================
```{r}
asia %>% dplyr::select(2,3,8,5,9,6,7,11) %>%
DT::datatable(
options = list(pageLength = 30))
```
Data {#data-describe data-navmenu="Analysis"}
=====================================
Taiwan Open data
- [Taiwan Power Company Open Data](http://www.taipower.com.tw/content/announcement/ann01.aspx?BType=31)
- [Taiwan Educational Level](http://data.gov.tw/node/8409)
- [Income Tax Data](http://data.gov.tw/node/17983)
- [Population Data](http://data.moi.gov.tw/MoiOD/Data/DataDetail.aspx?oid=F4478CE5-7A72-4B14-B91A-F4701758328F)
- [Household Data](http://data.moi.gov.tw/MoiOD/Data/DataDetail.aspx?oid=F4478CE5-7A72-4B14-B91A-F4701758328F)
- [Ranks of Total Retail Sales](https://moeagis.carto.com/viz/b5e9f4e8-dc7c-11e6-8815-0ef24382571b/public_map)
- [Registered Business Sectors](http://ronnywang-twcompany.s3-website-ap-northeast-1.amazonaws.com/index.html)
- [Housing Login Prices](http://plvr.land.moi.gov.tw/DownloadOpenData)
- [Park Data](https://sheethub.com/data.taipei.gov.tw/%E8%87%BA%E5%8C%97%E5%B8%82%E9%84%B0%E9%87%8C%E5%85%AC%E5%9C%92%E9%BB%9E%E4%BD%8D%E8%B3%87%E6%96%99)
- [電線桿 Data]()
- [Trash Data]()
- [行道樹 Data]()
---------------
Detail Description
- 使用2016年7月、8月的非營業電力資料分析
- 使用2016年教育程度的資料,合理推估2016年7月、8月的教育程度狀況
- 所得稅資料之涵蓋範圍為2013年,假設2013年與2016年之人口結構相似進行推估
- 人口統計資料之最新資料為2015年,合理假設2015年與2016年之人口結構相似,因此使用2015年推估2016年
- 使用2016第三季之住宅統計資料
- 使用2016零售業銷售金額之村里排名
- 累積至2017年7月之營業商家登記數
- 使用2014年-2016年之實價登錄資料
- 累積至2015年7月之公園數量與坪數
- 2016年電線桿分佈與數目
- 2016年垃圾量資料
- 2017年台北市行道樹量
Analysis Process {#analysis2 data-navmenu="Analysis"}
=====================================
Analysis Process
1. Cleane twelve data set separately.
2. Merge data that cleaned together.
3. Use "Greedy Search" that include "Loss fuction" and "K-means" to find out key features.
4. At the same time, cluster data into five groups.
5. Use electricity use per household on each cluster to find the target audiance.
Processed Data {#analysis1 data-navmenu="Analysis"}
=====================================
### Processed Data
```{r}
tp_cluster %>%
select(行政區域,人口數,戶均用電,青少年人口,壯年人口,老年人口,綜合所得總額,中位數,房價中位數,商家數,公園數,公園坪數,每戶平均人數,博士比例,碩士比例,大學比例,大學以下比例,平均屋齡,零售排名,有偶比例..., 每戶平均老年人口數.人.,無老年人口戶數比例...,三位以上老年人口戶數比例...,X1戶一宅.宅.) -> origin
colnames(origin) = c("行政區域", "人口數", "戶均用電", "青少年人口", "壯年人口", "老年人口", "綜合所得總額", "所得中位數","房價中位數", "總登記商家數", "公園數", "公園坪數", "戶均人數", "博士比例","碩士比例", "大學比例", "大學以下比例", "平均屋齡", "村里零售金額排名","有偶比例", "戶均老人數", "無老人戶數比例", "三以上老人戶數比", "1戶一宅")
origin %>%
DT::datatable(options = list(pageLength = 30)) %>%
formatRound(c(2,13:18,20,21),digits = 2)
```
Radar Data {#analysis3 data-navmenu="Analysis"}
=====================================
Column {data-width=700}
-------------------------------------
### Radar Data
```{r}
tp_cluster %>%
select(行政區域,分群,大學以下比例, 扶養比,無老年人口戶數比例...,三位以上老年人口戶數比例...,X1戶一宅.宅., 戶均用電) -> radar_data
names(radar_data) = c("Region","Cluster", "大學以下比例","扶養比","無老人戶數比","三以上老人戶數比","一戶一宅", "Electricity")
radar_data$Cluster = as.character(radar_data$Cluster)
radar_data[radar_data$Cluster == "1", "Cluster"] <- "Cluster1"
radar_data[radar_data$Cluster == "2", "Cluster"] <- "Cluster2"
radar_data[radar_data$Cluster == "3", "Cluster"] <- "Cluster3"
radar_data[radar_data$Cluster == "4", "Cluster"] <- "Cluster4"
radar_data[radar_data$Cluster == "5", "Cluster"] <- "Cluster5"
radar_data %>%
DT::datatable(
options = list(pageLength = 30)) %>% formatRound(3:9,digits = 3)
```
Column {data-width=300}
-------------------------------------
### Index Variables
1. 扶養比
扶養比 =(少年人口+老年人口)/壯年人口
- 少年人口:0~14歲的人口
- 壯年人口:15~64歲的人口
- 老年人口:65以上的人口
2. 大學以下比例
這個里中學歷僅到大學以下的人數比例
3. 無老年人口戶數比例
這個里中無老年人口戶數的比例
4. 三位以上老年人口戶數比例
這個里中有三位以上老人的戶數
5. 1戶一宅
這個裡的一戶一宅宅數
Cluster{#cluster data-navmenu="Cluster"}
=====================================
Column {.tabset .tabset-fade}
-------------------------------------
### 單身弱勢族群
```{r}
## color
col.raw <- c("#1d3156","#984ea3","#4daf4a","#ff7f00","#e41a1c","#377eb8")
## cluster 1 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 1 : 單身弱勢族群 ") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1 ",
data = cluster_rader$第一群,
pointPlacement = 'on',color=col.raw[2]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### 高年齡族群
```{r}
## cluster 2 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 2: 高年齡族群") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 2 ",
data = cluster_rader$第二群,
pointPlacement = 'on',color=col.raw[3]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### 年輕勞力族群
```{r}
## cluster 3 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 3 : 年輕勞力族群") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 3 ",
data = cluster_rader$第三群,
pointPlacement = 'on',color=col.raw[4]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### 核心家庭
```{r}
## cluster 4 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 4 : 核心家庭") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 4 ",
data = cluster_rader$第四群,
pointPlacement = 'on',color=col.raw[5]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### 高教育族群
```{r}
## cluster 5 v.s median
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_title(text = "Cluster 5 : 高教育族群") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1 ",
data = cluster_rader$第五群,
pointPlacement = 'on',color=col.raw[6]),
list(
name = "median ",
data = cluster_rader$med,
pointPlacement = 'on',color=col.raw[1])
)
```
### Summary
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
group_by(Cluster) %>%
summarise(n = n(), mean=mean(Electricity),med=median(Electricity)) %>%
arrange(Cluster) %>%
rbind(summarise(Cluster = "Total",radar_data,n = n(), mean=mean(Electricity),med=median(Electricity)) ) %>%
datatable(options = list(pageLength = 6) ,colnames=c("Cluster", "Count", "Mean", "Median")) %>% formatRound(3:4,digits = 3)
```
Column {data-width=350}
-------------------------------------
### Outcome
Cluster1 : 單身弱勢族群
很明顯的教育程度大學以下比例相當高,且老人沒有太多扶養比相對較低的單身族群,有名的[一戶百口人的洲美里](http://www.chinatimes.com/newspapers/20150527000550-260107)在這一群中,這群的戶均用電最低。
Cluster2 : 高年齡族群
除了三位以上老人戶數比例偏高外,各項指標皆與總體中位數幾乎重疊,故比較起來為年齡偏高的族群。
Cluster3 :年輕勞力族群
無老人的比例偏高,扶養比偏低,有第二高的教育程度大學以下比例
Cluster4 :核心家庭
一戶一宅的比例最高,而無老人的比例最高,但扶養比表現正常,代表可能是一般常見的父母與小孩的核心家庭,是用電量是屬於偏高的族群
Cluster5 :高教育族群
教育程度是大學以下比例最低,無老人的比例是低的,且有最高的扶養比,代表應該有部分老人也有小孩,用電量是屬於偏高的族群。
Five Groups {#comparison_T data-navmenu="Cluster"}
=====================================
Electricity use of per household in five groups
```{r}
p <- plot_ly(radar_data %>% dplyr::select(1,2,8) , y = ~Electricity,
alpha = 0.1, boxpoints = "suspectedoutliers" )
p %>% add_boxplot(x = ~Cluster)
```
Radar chart {#comparison3 data-navmenu="Cluster"}
=====================================
### All radar chart
```{r}
## 推疊雷達圖
highchart() %>%
hc_chart(polar = TRUE, type = "line") %>%
hc_xAxis(categories = cluster_rader$index,
tickmarkPlacement = 'on',
lineWidth = 0) %>%
hc_yAxis(gridLineInterpolation = 'polygon',
lineWidth = 0,
min = 0, max = 1) %>%
hc_series(
list(
name = "cluster 1-單身弱勢族群",
data = cluster_rader$第一群,
pointPlacement = 'on',color=col.raw[2]),
list(
name = "cluster 2-高年齡族群",
data = cluster_rader$第二群,
pointPlacement = 'on',color=col.raw[3]),
list(
name = "cluster 3-年輕勞力族群",
data = cluster_rader$第三群,
pointPlacement = 'on',color=col.raw[4]),
list(
name = "cluster 4-核心家庭",
data = cluster_rader$第四群,
pointPlacement = 'on',color=col.raw[5]),
list(
name = "cluster 5-高教育族群",
data = cluster_rader$第五群,
pointPlacement = 'on',color=col.raw[6]),
list(
name = "Total median",
data = cluster_rader$med,
pointPlacement = 'on',color= col.raw[1])
)
```
ClusterMap{#clustermap}
=====================================
Column {data-width=600}
-----------------------------------------------------------------------
### Map
Column {.tabset .tabset-fade data-width=400}
-------------------------------------
### 單身弱勢族群
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
filter(Cluster == "Cluster1") %>%
ggplot( aes(x=Electricity)) +
geom_histogram(binwidth=40, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Single Vulnerable Group ", x = "Electricity utilization")#+
# geom_vline(data=cdat, aes(xintercept=rating.mean), linetype="dashed", size=1, colour="red")
```
### 高年齡族群
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
filter(Cluster == "Cluster2") %>%
ggplot( aes(x=Electricity)) +
geom_histogram(binwidth=25, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Senior Group ", x = "Electricity utilization")
```
### 年輕勞力族群
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
filter(Cluster == "Cluster3") %>%
ggplot( aes(x=Electricity)) +
geom_histogram(binwidth=25, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Young Working Group ", x = "Electricity utilization")
```
### 核心家庭
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
filter(Cluster == "Cluster4") %>%
ggplot( aes(x=Electricity)) +
geom_histogram(binwidth=30, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of Nuclear Family ", x = "Electricity utilization")
```
### 高教育族群
```{r}
radar_data %>% dplyr::select(1,2,8) %>%
filter(Cluster == "Cluster5") %>%
ggplot( aes(x=Electricity)) +
geom_histogram(binwidth=30, colour="black", fill="white")+
coord_cartesian(xlim = c(400,2510))+
labs(title ="Histogram of High Educated Group", x = "Electricity utilization")
```
Waste of electricity{#waste}
=====================================
Column {.tabset .tabset-fade}
-------------------------------------
### Map
### TA
```{r}
tp_cluster %>% filter(離群==1) %>%
group_by(分群) %>%
dplyr::select(行政區域,戶均用電,每戶平均人數,扶養比,每戶平均老年人口數.人.,平均綜合所得,非營業用電_里變異,大學以下比例) %>%
arrange(分群) -> ta
names(ta) <- c("分群","行政區域","戶均用電","戶均人數","扶養比","戶均老人數","平均綜合所得","用電變異","大學以下比例" )
DT::datatable(ta,
options = list(pageLength = 21)) %>% formatRound(3:9,digits = 3)
```
Column {data-width=350}
-----------------------------------------------------------------------
### Outcome
每一群中浪費電的目標族群之特性
------
1. 單身弱勢族群
- 15個里
- 教育程度高
- 高收入
- 一宅超過三戶
- 一年中用電變異大
2. 高年齡族群
- 16個里
- 一宅戶數越少
- 一戶人數越多
- 一年中用電變異大
3. 年輕勞力族群
- 15個里
- 高收入
- 一宅戶數越少
- 一年中用電變異大
4. 核心家庭
- 6個里
- 教育程度高
- 高收入
- 一宅戶數越少
- 一年中用電變異大
5. 高教育族群
- 10個里
- 高收入
- 老人比例高
- 一年中用電變異大
Conclusion {#conclusion}
============================
**Efficiency Evaluation**
若浪費電的人可以達到自己群的平均,可節省台北 4% 以上的電力!
Sidebar {.sidebar}
============================
Hi everyone, We are **Life is struggle.**
__U-Optimizer__
主題:節能、開放創新
政府端
我們希望可以幫助政府制定有效率的節電策略,因此需要找到到底誰在浪費電,我們透過分群的方式找到用電行為相似的族群,再各群中找出用電量異常高於平均的里,最後利用決策樹找出浪費電的里有什麼樣的特徵。
使用者端
使用者可以使用我們的產品來督促自己是否為浪費電的人
------
__Members:__
1. [Peng-Wen,Lin (林芃彣 Nicole)-
NCCU Department of Statistics](https://www.facebook.com/profile.php?id=100000344369057)
2. [Pei Wen,Yang (楊佩雯 Penny)-
NCCU Department of Statistics](https://www.facebook.com/yangpenny0903?fref=ufi)
3. [Pei Shuan - Haung (黃培軒 Bacon)-
NCCU Department of Statistics](https://www.facebook.com/profile.php?id=100004119858241)
4. [Li-Jer, Lin (林立哲))-
CITIC Housing](https://www.facebook.com/sweetcow)